VHD(x)

What is VHD(x)?

The Virtual Hard Disk (VHD) format is a publicly available image format specification that allows encapsulation of the hard disk into an individual file for use by the operating system as a virtual disk in all the same ways physical hard disks are used. These virtual disks are capable of hosting native file systems (NTFS, FAT, exFAT, and UDFS) while supporting standard disk and file operations. VHD API support allows management of virtual disks. Virtual disks created with the VHD API can function as boot disks.

VHD is designed for use by virtual machines, and VHDs are usually attached to virtual machines as their disks. VHD was introduced with Microsoft’s acquisition of Connectix and their Virtual PC product in 2003. The VHD Specification was released as an open specification in 2006. A draft specification (v0.95) for the VHDX format was released in April 2012, proposing VHDX (sometimes called Virtual Hard Disk v2) as the successor to VHD. The specification has since been finalized; Microsoft publishes the latest and previous versions, and VHDX has been the default format for the Microsoft Hyper-V hypervisor since 2012. Citrix’s XenServer also made heavy use of the VHD format.

Other Virtual Disk formats are available and in use, notably the proprietary VMware VMDK (Virtual Machine Disk) format and the Oracle VDI (Virtual Disk Image) format which is the default disk format for the open-source Oracle VM VirtualBox.


The VHD format – how VHD represents a disk, types of VHD

VHD represents a disk as a file consisting of:

  • a footer,
  • the space for data,
  • and possibly a header.

Three types of Virtual Hard Disk image are defined:

  • FIXED
  • DYNAMIC
  • DIFFERENCING

Hypervisors such as Hyper-V and XenServer make heavy use of DYNAMIC and DIFFERENCING VHDs.


VHD types – FIXED, DYNAMIC, DIFFERENCING

VHDX modifies the format slightly but inherits the three basic disk types from the VHD specification. The original VHD disk types were defined thus:

FIXED

The VHD consumes a fixed amount of space on the host machine's hard disk drive. The size of the virtual hard disk file does not change as data is added or removed. Fixed VHDs offer fast indexing and processing, and their fragmentation stays constant. The footer adds only a small amount of space (512 bytes), so a 2GB VHD occupies ~2GB, made up of:

  • Space for data (fixed)
  • A footer (fixed)

The maximum size is limited by the host file system, e.g., on FAT32 the maximum file size is 4GB.
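
The layout above can be turned into a quick size estimate. This is a minimal sketch, assuming the 512-byte footer defined by the VHD specification; the function name `fixed_vhd_file_size` is hypothetical and not part of any VHD API.

```c
#include <assert.h>
#include <stdint.h>

/* Size in bytes of the VHD footer (assumed from the VHD specification). */
#define VHD_FOOTER_SIZE 512u

/* Hypothetical helper: on-disk size of a FIXED VHD that exposes
 * `virtual_size` bytes of data (data area followed by the footer). */
static uint64_t fixed_vhd_file_size(uint64_t virtual_size)
{
    /* Guard against wrap-around for absurdly large inputs. */
    assert(virtual_size <= UINT64_MAX - VHD_FOOTER_SIZE);
    return virtual_size + VHD_FOOTER_SIZE;
}
```

For a 2GB disk, `fixed_vhd_file_size(2147483648ULL)` gives 2147484160 bytes, i.e., the data plus one footer.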

DYNAMIC

This VHD format has a varying disk size. The storage space occupied starts at a particular minimum size and grows as data is added to the VHD, i.e., it is a “File that at any given time is at least as large as the actual data written to it plus the header and footer”. The format consists of:

  • Space for data
  • A footer
  • A header

A 2GB VHD is initially 2MB. When a block is added, the footer is moved to the new end of the file. Critical data and the footer are also mirrored into the header.

[Figure: The format of a Dynamic VHD disk]

DIFFERENCING

A DIFFERENCING disk has the same structure as a dynamic disk. It has to be associated with a parent disk, typically a DYNAMIC disk or another DIFFERENCING disk. Some bit mask fields, e.g., sector bitmaps, have an overloaded meaning. This type of disk represents the current state of the virtual hard disk as a set of modified blocks in comparison to the parent virtual hard disk file.


The limitations of VHD disks – Why VHDX was created

VHDX was introduced to overcome certain limitations with VHD that had arisen as data sets grew, storage arrays changed in specification and the industry demanded faster read/write times and increased resilience against data failure. In summary VHD disks had:

  • Max. size of differencing/dynamic disk 2TB, limited by the 32-bit BAT entries
  • 512-byte sectors – this led to slower access times as data blocks were usually aligned to a 512-byte boundary rather than a 4KB sector boundary. Newer storage arrays have 4KB sectors.
  • Default 2MB data blocks
  • Dynamic/differencing disks expand as data gets written but never shrink if it’s deleted!
  • Limited protection against and indicators of corruption:
    • Copy of the footer in dynamic/differencing disk header
    • Checksum field
    • Parent disk date stamps checked
  • Big endian fields
  • Limited metadata capacity
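
The 2TB limit in the list above follows from simple arithmetic. The sketch below assumes, as in the VHD specification, that each 32-bit BAT entry stores a sector number and that sectors are 512 bytes; the helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define VHD_SECTOR_SIZE 512u  /* bytes per VHD sector */

/* Number of bytes addressable when each BAT entry is `entry_bits`
 * wide and stores a sector number. */
static uint64_t max_addressable_bytes(unsigned entry_bits)
{
    assert(entry_bits > 0 && entry_bits <= 32);
    return ((uint64_t)1 << entry_bits) * VHD_SECTOR_SIZE;
}
```

With 32-bit entries this gives 2^32 × 512 = 2^41 bytes, which is the 2TB ceiling.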

VHDX - the advantages over VHD

The initial VHDX specification improved on VHD with:

  • Dynamic disks up to 64TB
  • Logical sectors up to 4KB
  • Block sizes up to 256MB (Min.=1MB, Max.=256MB, must be a power of 2)
  • Log for resilience to corruption from e.g., Power Failures
  • Mechanism to attach small pieces of user data (metadata)
  • Ability to use unmap / TRIM. Disks are able to shrink when data is deleted: the file grows as data is added, and space freed by removed data can be reclaimed. The ability to compact VHD(X) files has become very important in workflows such as FSLogix profiles
  • A format designed to be extensible
  • Improved alignment of the virtual hard disk format to work well on large sector disks
  • All multi-byte values are stored in little endian format (Microsoft), which makes it easier to write a parser
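
The block size rule in the list above (a power of two between 1MB and 256MB) is easy to check in code. This is a minimal sketch; both function names are hypothetical, and the block count is simply the disk size divided by the block size, rounded up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MIB (1024u * 1024u)

/* True when `block_size` (bytes) is a power of two between 1MB and 256MB. */
static bool vhdx_block_size_is_valid(uint32_t block_size)
{
    bool in_range = block_size >= 1u * MIB && block_size <= 256u * MIB;
    bool power_of_two = block_size != 0 && (block_size & (block_size - 1)) == 0;
    return in_range && power_of_two;
}

/* Number of payload blocks needed to cover `disk_size` bytes. */
static uint64_t vhdx_payload_block_count(uint64_t disk_size, uint32_t block_size)
{
    assert(vhdx_block_size_is_valid(block_size));
    return (disk_size + block_size - 1) / block_size;  /* round up */
}
```

For example, a 64TB disk with 32MB blocks needs 2^46 / 2^25 = 2^21 payload blocks.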

VHDX structures – a deep dive

VHDX supports the same 3 types of disk as VHD did, namely: FIXED, DYNAMIC and DIFFERENCING. All 3 types share the same logical and physical layout:

Header Section

  • File type identifier - never written to, identifies the file as VHDX regardless of other corruptions
  • TWO headers
    • Only one active at any one time. Each contains:
      • Location of the log
      • Basic file metadata
      • Some checksum data
  • Region Table
  • The BAT (Block Allocation Table)
  • Metadata region
  • Extensible (Parser guidelines exist to ensure this)

The Log

The addition of the log introduced significant resilience and traceability into the VHDX format above that offered by the original VHD specification. Key characteristics and benefits of the VHDX log include:

  • Variable sized
  • Contiguous ring buffer
  • Pointed to by the header
  • If not empty when a file is opened, the log must be replayed before the file is used
  • Each entry is aligned to at least 4KB (the alignment may be greater), so log writes are isolated on the host disk

Blocks 

  • Payload blocks (Min.=1MB, Max.=256MB, must be a power of 2)
  • Sector bitmap blocks

BAT entries, Sector bitmap entries, and Metadata region updates must be written through the log. However, payload data blocks are not written through the log. As the header section locates the log, updates to the header section cannot go through the log.
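
The routing rule in the paragraph above can be written down as a small decision function. This is only an illustrative sketch: the enum and the function name are hypothetical and do not come from the VHDX specification or any Microsoft API.

```c
#include <assert.h>
#include <stdbool.h>

/* Kinds of update named in the text; the enum itself is hypothetical. */
enum vhdx_update_kind {
    UPDATE_BAT_ENTRY,
    UPDATE_SECTOR_BITMAP,
    UPDATE_METADATA_REGION,
    UPDATE_PAYLOAD_DATA,
    UPDATE_HEADER_SECTION,
    UPDATE_KIND_COUNT
};

/* True when an update of this kind has to be written through the log. */
static bool must_go_through_log(enum vhdx_update_kind kind)
{
    assert((unsigned)kind < (unsigned)UPDATE_KIND_COUNT);
    switch (kind) {
    case UPDATE_BAT_ENTRY:
    case UPDATE_SECTOR_BITMAP:
    case UPDATE_METADATA_REGION:
        return true;             /* structural updates are logged */
    case UPDATE_PAYLOAD_DATA:    /* payload data bypasses the log */
    case UPDATE_HEADER_SECTION:  /* the header section locates the log itself */
    default:
        return false;
    }
}
```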


VHD(X) DIFFERENCING disks are very useful

DIFFERENCING disks can be made based on DYNAMIC disks. A DIFFERENCING disk can then be made with a parent DIFFERENCING disk.

Once you associate a DIFFERENCING disk with a parent DYNAMIC disk:

  • The parent should not change
  • The time stamps are used to check the parent hasn’t changed
  • Writes will go to the latest child DIFFERENCING disk
  • Reads can be performed from either the parent disk or any in a chain of parent differencing disks
  • Only the active, latest child DIFFERENCING disk can be written to

Differencing VHD (Virtual Hard Disk) disks are heavily used in virtualization environments in scenarios such as:

  • Creating multiple virtual machines with similar configurations but with slight variations or modifications (master and golden image workflows).
  • Snapshotting workflows around backup or VM migration.
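
The read rule for a chain of differencing disks described above can be sketched as a walk from the active child towards the base disk. The `struct disk` below is a hypothetical in-memory model (one presence flag per sector) used purely for illustration; a real implementation would consult the on-disk bitmaps instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory view of one disk in a differencing chain. */
struct disk {
    const struct disk *parent;   /* NULL for the base (non-differencing) disk */
    const uint8_t     *present;  /* one flag per sector: 1 = stored in this file */
    size_t             sector_count;
};

/* Walk from the active child towards the base disk and return the first
 * disk that holds `sector`. The base disk answers for anything that no
 * differencing disk in the chain has overridden. */
static const struct disk *resolve_read(const struct disk *child, size_t sector)
{
    assert(child != NULL);
    const struct disk *d = child;
    while (d->parent != NULL) {
        assert(sector < d->sector_count);
        if (d->present[sector])
            return d;
        d = d->parent;
    }
    return d;  /* reached the base disk */
}
```

Writes need no such walk: they always go to the active child, which is why the parents can stay unchanged.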

VHD(X) - The Block Allocation Table (BAT)

BAT is a region consisting of a single array of 64-bit values, with an entry for each block that determines the state and file offset of that block. The entries for the payload block and sector bitmap block are interleaved in a way that the sector bitmap block entry associated with a chunk follows the entries for the payload blocks in that chunk. 

The BAT layout is the same for all types of VHDX, whether fixed, dynamic, or differencing. However, in a fixed or dynamic VHDX, the sector bitmap blocks will not be allocated. In a dynamic VHDX all sectors are contained within the file. Because the sector bitmap blocks are used to determine whether a parent VHDX file contains payload data, they are unnecessary in a dynamic VHDX. 

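The BAT is a single array of 64-bit (8-byte) entries:

    struct VHDX_BAT_ENTRY {
        UINT64     State:3;          /* block state (see below) */
        UINT64     Reserved:17;      /* reserved bits */
        UINT64     FileOffsetMB:44;  /* offset of the block in the file, in MB */
    };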

Each BAT entry indexes either a payload block or a sector bitmap block. The “State” field holds an enumerated value: for a payload block, e.g., PAYLOAD_BLOCK_NOT_PRESENT, PAYLOAD_BLOCK_UNMAPPED or PAYLOAD_BLOCK_FULLY_PRESENT (among others); for a sector bitmap block, e.g., SB_BLOCK_NOT_PRESENT.
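
Because bit-field layout is implementation-defined in C, a portable parser may prefer shifts and masks over the struct. The sketch below assumes the entry has already been read as a little-endian 64-bit value, with State in bits 0-2 and FileOffsetMB in bits 20-63; the state values are the ones given in the VHDX specification, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* A few payload block states (values assumed from the VHDX specification). */
enum {
    PAYLOAD_BLOCK_NOT_PRESENT   = 0,
    PAYLOAD_BLOCK_UNMAPPED      = 3,
    PAYLOAD_BLOCK_FULLY_PRESENT = 6
};

/* Low 3 bits: block state. */
static unsigned bat_state(uint64_t entry)
{
    return (unsigned)(entry & 0x7u);
}

/* Top 44 bits: file offset in units of 1MB. */
static uint64_t bat_file_offset_mb(uint64_t entry)
{
    return entry >> 20;
}

/* Same offset expressed in bytes. */
static uint64_t bat_file_offset_bytes(uint64_t entry)
{
    return bat_file_offset_mb(entry) << 20;
}

/* Build an entry from its parts; the 17 reserved bits stay zero. */
static uint64_t bat_make(unsigned state, uint64_t offset_mb)
{
    assert(state <= 7 && offset_mb < ((uint64_t)1 << 44));
    return (uint64_t)state | (offset_mb << 20);
}
```

A fully present block stored 5MB into the file is encoded as `bat_make(PAYLOAD_BLOCK_FULLY_PRESENT, 5)`, which is `0x500006`.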

Chunk Ratio

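The Chunk Ratio is the number of payload blocks indexed by one sector bitmap block:

    ChunkRatio = (2^23 × LogicalSectorSize) / BlockSize

The BAT for a Chunk Ratio = 4 will look like:

    PB0  PB1  PB2  PB3  SB0  PB4  PB5  PB6  PB7  SB1  ...

where PBn is the entry for payload block n and SBn is the entry for sector bitmap block n.

Note: Sector bitmap entries exist in the BAT of a dynamic VHDX, but the blocks themselves are never allocated; this makes conversion from dynamic to differencing disks simpler.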
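
The interleaving of payload and sector bitmap entries can be expressed as index arithmetic. This is a sketch under the assumption that one 1MB sector bitmap block covers 2^23 sectors, so that the chunk ratio is 2^23 × LogicalSectorSize / BlockSize; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Sectors covered by one 1MB sector bitmap block: 2^23 bits. */
#define SECTORS_PER_BITMAP_BLOCK ((uint64_t)1 << 23)

/* Payload blocks per sector bitmap block. */
static uint64_t chunk_ratio(uint32_t logical_sector_size, uint32_t block_size)
{
    assert(block_size != 0);
    return SECTORS_PER_BITMAP_BLOCK * logical_sector_size / block_size;
}

/* BAT index of payload block `n`: one sector bitmap entry is skipped
 * for every complete chunk that precedes it. */
static uint64_t payload_bat_index(uint64_t n, uint64_t ratio)
{
    assert(ratio != 0);
    return n + n / ratio;
}

/* BAT index of the sector bitmap entry that closes chunk `c`. */
static uint64_t sector_bitmap_bat_index(uint64_t c, uint64_t ratio)
{
    return (c + 1) * ratio + c;
}
```

With 512-byte sectors and 32MB blocks the ratio is 128. Using the illustrative ratio of 4, payload block 4 lands at BAT index 5 because the sector bitmap entry for the first chunk occupies index 4.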

Sector Blocks

A sector bitmap block locates the sector bitmap. Sector bitmaps are always 1MB in size (a 2^23-bit mask). Each bit in a sector bitmap defines whether the virtual sector is present in this disk:

  • 1 = Retrieve sector from this file
  • 0 = Retrieve sector from parent disk

The LogicalSectorSize of a sector is either 4KB (4096 bytes) or 512 bytes.
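
Testing a bit in the sector bitmap might look like the sketch below. The bit order is an assumption made for illustration (bit 0 of byte 0 describes the first sector of the chunk) and should be checked against the VHDX specification before use; the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_BITMAP_BYTES ((size_t)1 << 20)          /* 1MB bitmap */
#define SECTOR_BITMAP_BITS  (SECTOR_BITMAP_BYTES * 8)  /* 2^23 sectors */

/* True when `sector` (index within the chunk) is stored in this file,
 * false when it must be read from the parent disk.
 * Assumption: bit 0 of byte 0 describes the first sector. */
static bool sector_in_this_file(const uint8_t *bitmap, size_t sector)
{
    assert(bitmap != NULL && sector < SECTOR_BITMAP_BITS);
    return (bitmap[sector / 8] >> (sector % 8)) & 1u;
}
```

A result of `true` corresponds to the "1 = retrieve sector from this file" case above, and `false` to the "0 = retrieve sector from parent disk" case.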